Beyond accuracy: creating interoperable and scalable text-mining web services

Authors

  • Chih-Hsuan Wei
  • Robert Leaman
  • Zhiyong Lu
Abstract

The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text. Availability and implementation: Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl. Contact: [email protected].
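The abstract names BioC as one of the supported exchange formats. As a rough illustration of why a shared format helps integration with other pipelines, the sketch below reads concept annotations out of a small BioC-style XML document using only the Python standard library. The sample document is hand-written for this example and is not output from the service; the function name `extract_annotations` and the sample sentence are assumptions made here, not part of the paper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the BioC XML layout (collection > document >
# passage > annotation). The sentence and annotations are illustrative
# only; they were not produced by any of the taggers named above.
SAMPLE_BIOC = """<collection>
  <source>example</source>
  <date>2016</date>
  <key>example.key</key>
  <document>
    <id>0</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 mutations increase breast cancer risk.</text>
      <annotation id="0">
        <infon key="type">Gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
      <annotation id="1">
        <infon key="type">Disease</infon>
        <location offset="25" length="13"/>
        <text>breast cancer</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def extract_annotations(bioc_xml):
    """Return (document id, concept type, start, end, mention) tuples."""
    root = ET.fromstring(bioc_xml)
    rows = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for passage in document.iter("passage"):
            for ann in passage.iter("annotation"):
                # infon elements are key/value pairs; take the one keyed "type".
                concept_type = next(
                    (i.text for i in ann.findall("infon") if i.get("key") == "type"),
                    None,
                )
                loc = ann.find("location")
                start = int(loc.get("offset"))
                end = start + int(loc.get("length"))
                rows.append((doc_id, concept_type, start, end, ann.findtext("text")))
    return rows


if __name__ == "__main__":
    for row in extract_annotations(SAMPLE_BIOC):
        print(row)
```

Because every tool that emits this layout can be read by the same few lines, a downstream pipeline does not need a separate parser per tagger, which is the interoperability argument the abstract makes.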


Similar resources

Service-Oriented Data Mining

A service is a software building block capable of fulfilling a given task or a distinct business function through a well-defined, loosely coupled interface. Services are like "black boxes": they operate independently within the system, and external components are not aware of how they perform their function; they only care that they return the expected result. The Service Oriented A...


How to build a WebFountain: An architecture for very large-scale text analytics

WebFountain is a platform for very large-scale text analytics applications. The platform allows uniform access to a wide variety of sources, scalable system-managed deployment of a variety of document-level “augmenters” and corpus-level “miners,” and finally creation of an extensible set of hosted Web services containing information that drives end-user applications. Analytical components can b...


Processing biological literature with customizable Web services supporting interoperable formats

Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent d...


OWS4SWAT: Publishing and Sharing SWAT Outputs with OGC standards

The Soil and Water Assessment Tool (SWAT) is a widely used hydrological model that produces several useful outputs (e.g. evapotranspiration, soil moisture, aquifer recharge, river discharge) as text files. Currently, visualizing and publishing SWAT outputs as geospatial data requires a lot of time and repetitive processing steps. Moreover, data used and produced are often not interoperable and ...


Web services-based text-mining demonstrates broad impacts for interoperability and process simplification

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER...



Journal title:
  • Bioinformatics

Volume 32, Issue 12

Pages: -

Publication date: 2016